Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add multi-column support to UpdateBy RollingFormula() operator #6143

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

lbooker42
Copy link
Contributor

No description provided.

@lbooker42 lbooker42 added this to the 0.37.0 milestone Sep 26, 2024
@lbooker42 lbooker42 self-assigned this Sep 26, 2024
Copy link
Contributor

@malhotrashivam malhotrashivam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review Part 1: All replicated files and files with smaller changes

Comment on lines 32 to 33
// noinspection unchecked
result = new ObjectRingBufferVectorWrapper<T>((ObjectRingBuffer<T>) buffer, (Class<T>) componentType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check to get full test coverage.

@lbooker42 lbooker42 changed the title Add multi-column support to UpdateBy RollingFormula() operator feat: add multi-column support to UpdateBy RollingFormula() operator Oct 9, 2024
formulaInputSource.set(new ObjectRingBufferVectorWrapper(windowValues, inputVectorType));
// noinspection rawtypes
formulaInputSource.set(new ObjectRingBufferVectorWrapper(windowValues, inputComponentType));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fully understand this change, can you please explain?
Also, can we use ByteRingBufferVectorWrapper here since we would have byte values internally? Please correct me if I am mistaken or if this is out of scope for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't fully understand this change, can you please explain?

Fixing an earlier bug. I'm trying to track the fundamental component type of the input Object column to the Vector created from it.

Also, can we use ByteRingBufferVectorWrapper here since we would have byte values internally? Please correct me if I am mistaken or if this is out of scope for this PR.

This matches AggFormula in creating ObjectVector when assembling groups of Boolean input. The functions in Basic.java expect this input.

Comment on lines 30 to 31
public abstract String formula();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change from List to a single value require any documentation update? Also, is this a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting multiple formula was never exposed to the user so this change is entirely internal.


private final String formula;
private final TableDefinition tableDef;
private final QueryCompilerRequestProcessor compilationProcessor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't follow where the formula and compilationProcessor are being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are used in RollingFormulaMultiColumnOperator#copy, but that method is used to support PartitionedTable proxy() calls and can be made more efficient (by supplying the compiled formula column, eg.).

@lbooker42
Copy link
Contributor Author

lbooker42 commented Oct 17, 2024

Python support for this is incomplete, need to update function signatures and documentation, plus include examples.
Have added this support in the latest commit.

@@ -40,6 +40,8 @@
*/
public abstract class UpdateByOperator {
protected final MatchPair pair;

// Input columns for this operator. Must be dynamic to support discovery (e.g. formula columns).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said this is OBE.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. We can take a more thorough/complete pass next week if needed, but I don't want to duplicate effort too much. Do want to look at things after refactoring to rely more on SelectColumn and Selectable.

@deephaven deephaven deleted a comment from github-actions bot Oct 23, 2024
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to look at the operator code (base and multi). Deferring Python.

@@ -147,7 +155,7 @@ public boolean add(char e) {
* @param count the minimum number of empty entries in the buffer after this call
* @throws UnsupportedOperationException when {@code growable} is {@code false} and buffer is full
*/
public void ensureRemaining(int count) {
public void ensureRemaining(final int count) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More opportunities to add @Override to this code.

public void clear() {
tail = head = 0;
// region object-bulk-clear
Arrays.fill(storage, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there are an argument for filling less, since we knew the head and tail?

/**
* Create a new {@link RingBufferWindowConsumer} for the given buffer.
*
* @param buffer the buffer to manage and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete

/**
* Push {@code count} values from the input chunk into the ring buffer, beginning at {@code index}.
*
* @param index the index to push the value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param index the index to push the value
* @param index the index of the first value to push

* Push {@code count} values from the input chunk into the ring buffer, beginning at {@code index}.
*
* @param index the index to push the value
* @param count the count of the value to push
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param count the count of the value to push
* @param count the count of values to push

* reverse window, so calling this with {@code revTicks = 1} will simply return the current row. Specifying
* {@code revTicks = 10} will include the previous 9 rows to this one and this row for a total of 10 rows.
* <p>
* The provided {@code formula} should specify the output column name Some examples of formula are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply to all similar methods.

Suggested change
* The provided {@code formula} should specify the output column name Some examples of formula are:
* The provided {@code formula} should specify the output column name. Some examples of formula are:

@@ -87,11 +153,4 @@ final void checkFormula() {
throw new IllegalArgumentException("formula must not be empty");
}
}

@Value.Check
final void checkParamToken() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could do a Selectable.parse up front to determine whether the output column name is specified when param token is missing. Maybe redundant, but that's sort of in the spirit of our immutables conventions. Could also conceivably make that selectable() a lazily-computed, cached field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be sure you have:

  1. Tests for 0, 1, 2, and 3+ inputs using the new expected syntax.
  2. Tests that you break with >1 input if paramToken is supplied.

vectorColumnNameMap = new HashMap<>();
columnDefinitionMap.forEach((key, value) -> {
final ColumnDefinition<?> columnDef = ColumnDefinition.fromGenericType(
key, VectorFactory.forElementType(value.getDataType()).vectorType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add component type (that is, value.getDataType()). I wonder if that means we need more testing for Object inputs.

}

// Get the input column names and data types from the formula.
final String[] inputColumnNames =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should validate that getColumnArrays() returns an empty list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants